9th of 9 Questions.

You are designing a system for an e-commerce platform. A 'Product' has 50 attributes, but only 5 are used for the search results page. The 'Reviews' can grow into the thousands. How do you model this?

Use a hybrid approach: store full product attributes in a main collection, create a separate collection for search-focused product summaries, and store reviews in their own collection with references to products.

For this e-commerce scenario, a single-collection approach is problematic. Storing all 50 attributes in one document is fine, but embedding thousands of reviews creates an unbounded array: the document keeps growing, can eventually hit MongoDB's 16 MB document size limit, and forces search queries to read far more data than they need. A better design combines several MongoDB design patterns: keep the full product in one collection, maintain a denormalized search collection holding the 5 frequently queried attributes, and store reviews in a separate, indexed collection.

Complete Schema Design
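As a rough illustration of the three document shapes, the sketch below builds them as plain Python dictionaries. The field names (name, thumbnailUrl, productId, userId, createdAt), the collection roles, and the 1 to 5 rating range are assumptions made for the example; the scenario only says that 5 of the 50 attributes drive the search page.

```python
from datetime import datetime, timezone

# Assumption: these five names stand in for the attributes the search page needs.
SEARCH_FIELDS = ("name", "category", "brand", "price", "thumbnailUrl")


def make_search_summary(product):
    """Project the search-facing subset of a full product document."""
    summary = {"_id": product["_id"]}  # same _id as the source-of-truth product
    for field in SEARCH_FIELDS:
        summary[field] = product[field]
    return summary


def make_review(product_id, user_id, rating, text):
    """Build a review document that references its product by id."""
    if not 1 <= rating <= 5:  # assumed rating scale
        raise ValueError("rating must be between 1 and 5")
    return {
        "productId": product_id,
        "userId": user_id,
        "rating": rating,
        "text": text,
        "createdAt": datetime.now(timezone.utc),
    }


# A full product document would carry all 50 attributes; only a few are shown.
product = {
    "_id": "p1",
    "name": "Trail Shoe",
    "category": "footwear",
    "brand": "Acme",
    "price": 89.0,
    "thumbnailUrl": "/img/p1.jpg",
    "description": "Lightweight shoe for rocky trails",
    "weightGrams": 310,
}
summary = make_search_summary(product)
review = make_review("p1", "u42", 5, "Great grip")
```

The summary reuses the product's _id, so the search collection can be joined back to the full product without an extra reference field.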

Design Rationale

Product collection: Stores the complete authoritative data. This is the source of truth with all 50 attributes, properly indexed for product detail pages and admin operations.
Search collection: Denormalized specifically for search performance. Contains only the 5 fields needed for search results, enabling fast queries without loading large product documents. Can be updated via change streams or batch jobs when product data changes.
Reviews collection: A separate collection avoids the unbounded array problem. With potentially thousands of reviews per product, an embedded array would keep growing and could eventually exceed the 16 MB document limit. References allow efficient pagination and queries like "most recent reviews" or "highest rated".
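One way to keep the search collection in sync is to translate each change event from the product collection into a write against the search collection. The sketch below is a pure function over the change event document shape (operationType, documentKey, fullDocument, updateDescription); the SEARCH_FIELDS names and the returned (action, filter, payload) tuple are assumptions for illustration, and it only handles top-level field changes.

```python
# Assumption: the five search-page attributes, as hypothetical field names.
SEARCH_FIELDS = ("name", "category", "brand", "price", "thumbnailUrl")


def search_op_for_event(event):
    """Map a product change event to (action, filter, payload), or None to skip."""
    key = {"_id": event["documentKey"]["_id"]}
    op = event["operationType"]

    if op == "delete":
        return ("delete_one", key, None)

    if op in ("insert", "replace"):
        doc = event["fullDocument"]
        fields = {f: doc[f] for f in SEARCH_FIELDS if f in doc}
        return ("replace_one", key, {**key, **fields})

    if op == "update":
        desc = event["updateDescription"]
        changed = {
            f: v for f, v in desc.get("updatedFields", {}).items()
            if f in SEARCH_FIELDS
        }
        removed = {f: "" for f in desc.get("removedFields", []) if f in SEARCH_FIELDS}
        update = {}
        if changed:
            update["$set"] = changed
        if removed:
            update["$unset"] = removed
        # Skip events that only touch the other 45 attributes.
        return ("update_one", key, update) if update else None

    return None  # other event types are ignored in this sketch
```

With a driver such as PyMongo, events like these could come from iterating `db.products.watch()`, with each returned tuple applied to the search collection; a batch job could reuse the insert/replace branch to rebuild summaries in bulk.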

For the search results page, a dedicated search collection containing only the necessary fields keeps documents small, so more of them fit in memory and each query reads less data. Queries can use a compound index on category, price, and brand to return results quickly. MongoDB can even answer a query entirely from the index (a covered query) if every field the query filters on and returns is part of the index and _id is excluded from the projection, which eliminates document fetches. Querying the full product collection instead means reading documents that carry 45 fields the search page never shows, even when a projection trims them from the response.
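The compound index and the covered-query condition can be sketched as follows. The index key order, the example filter, and the `can_be_covered` helper are assumptions for illustration; the helper is a simplified check for inclusion-style projections and top-level filter fields, not a substitute for inspecting the actual query plan.

```python
# Compound index on the three fields named above, as (field, direction) pairs.
SEARCH_INDEX = [("category", 1), ("price", 1), ("brand", 1)]


def search_query(category, max_price):
    """Build a filter and projection that touch only indexed fields."""
    filter_ = {"category": category, "price": {"$lte": max_price}}
    # _id is returned by default, so it must be excluded explicitly.
    projection = {"_id": 0, "category": 1, "price": 1, "brand": 1}
    return filter_, projection


def can_be_covered(index, filter_, projection):
    """Simplified test: are all filtered and returned fields in the index?"""
    indexed = {name for name, _direction in index}
    returned = {field for field, keep in projection.items() if keep}
    id_excluded = projection.get("_id", 1) == 0
    return id_excluded and set(filter_) <= indexed and returned <= indexed


filter_, projection = search_query("footwear", 100)
covered = can_be_covered(SEARCH_INDEX, filter_, projection)
```

With PyMongo, `SEARCH_INDEX` could be passed to `create_index` and the filter and projection to `find`. Returning a field outside the index, such as a product name, means the query is no longer covered, although it can still use the index to find matching documents.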

Storing reviews separately solves multiple problems. First, it removes the 16 MB limit concern: the number of reviews per product is no longer bounded by document size. Second, it enables efficient pagination: you can fetch 10 reviews at a time with skip() and limit(), although skip() slows down on very deep pages because the skipped documents still have to be scanned. Third, you can create separate indexes for different review queries, such as finding all reviews by a user or sorting a product's reviews by rating. The trade-off is that retrieving a product with its reviews requires either two queries or a $lookup aggregation, but for review lists of this size the benefits of keeping reviews separate outweigh that cost.
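Both access paths can be sketched as plain data structures: the arguments for a paginated find, and an aggregation pipeline that joins a product with its newest reviews. The collection name `reviews` and the fields productId and createdAt are assumptions carried over for illustration.

```python
def review_page(product_id, page, page_size=10):
    """Arguments for fetching one page of a product's reviews, newest first."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "filter": {"productId": product_id},
        "sort": [("createdAt", -1)],
        "skip": (page - 1) * page_size,  # documents to pass over
        "limit": page_size,
    }


def product_with_reviews_pipeline(product_id, review_limit=10):
    """Aggregation pipeline: one product plus its most recent reviews."""
    return [
        {"$match": {"_id": product_id}},
        {"$lookup": {
            "from": "reviews",
            "let": {"pid": "$_id"},
            "pipeline": [
                {"$match": {"$expr": {"$eq": ["$productId", "$$pid"]}}},
                {"$sort": {"createdAt": -1}},
                {"$limit": review_limit},
            ],
            "as": "reviews",
        }},
    ]


page_three = review_page("p1", page=3)
pipeline = product_with_reviews_pipeline("p1", review_limit=5)
```

With PyMongo, the dictionary from `review_page` could be unpacked into `find` as keyword arguments, and the pipeline passed to `aggregate` on the product collection. An index on productId and createdAt in the reviews collection would let both paths avoid a collection scan.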

Alternative Approaches Considered

Embedding reviews with subset pattern: Embed only the 5 most recent reviews in the product document for the detail page, while keeping full history in a separate collection. This balances detail page performance with unbounded growth prevention.
Denormalized reviews in search collection: If search results need to show review counts or average ratings, denormalize these aggregates into the search collection and update them periodically.
Computed review aggregates: Store pre-computed values like reviewCount and averageRating in the product document, updated via background jobs when new reviews arrive.
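The subset pattern and the computed aggregates can be combined in a single update per new review. The sketch below shows the update document such a job might send, plus a pure-Python simulation of its effect on a product; the recentReviews field name and the incremental running-average formula are assumptions for illustration.

```python
def new_review_update(review, keep=5):
    """Update document that pushes a review and trims the embedded subset."""
    return {
        "$push": {
            "recentReviews": {
                "$each": [review],
                "$sort": {"createdAt": -1},  # newest first
                "$slice": keep,              # keep only the first `keep` entries
            }
        },
        "$inc": {"reviewCount": 1},
    }


def apply_review(product, review, keep=5):
    """Simulate the refreshed product: recent subset, count, and average."""
    count = product.get("reviewCount", 0)
    average = product.get("averageRating", 0.0)
    recent = sorted(
        product.get("recentReviews", []) + [review],
        key=lambda r: r["createdAt"],
        reverse=True,
    )[:keep]
    return {
        **product,
        "recentReviews": recent,
        "reviewCount": count + 1,
        # Running average: fold the new rating into the stored mean.
        "averageRating": (average * count + review["rating"]) / (count + 1),
    }


product = {"_id": "p1", "reviewCount": 2, "averageRating": 4.0, "recentReviews": []}
updated = apply_review(product, {"rating": 1, "createdAt": 3, "text": "Too narrow"})
```

The full review still goes into the reviews collection; the product document only carries the bounded subset and the aggregates. A periodic recomputation from the reviews collection can correct any drift if incremental updates are missed.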
